Overview

Dataset statistics

Number of variables: 20
Number of observations: 214679
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 100.7 MiB
Average record size in memory: 491.9 B
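
The average record size follows from the total memory footprint divided by the number of observations. A minimal sketch of that arithmetic, assuming the report's MiB means 1024 × 1024 bytes (the variable names are illustrative, not taken from the report):

```python
MIB = 1024 ** 2  # bytes per mebibyte (assumed unit convention)

total_size_mib = 100.7      # "Total size in memory" from the report
n_observations = 214_679    # "Number of observations" from the report

avg_record_bytes = total_size_mib * MIB / n_observations
print(f"{avg_record_bytes:.1f} B")  # prints 491.9 B
```

Because the total is itself rounded to one decimal place, the result only agrees with the reported figure to about that precision.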

Variable types

Numeric: 12
DateTime: 1
Categorical: 7

Warnings

Company has constant value "Yellow Cab" [Constant]
df_index is highly correlated with Transaction ID and 1 other field [High correlation]
Transaction ID is highly correlated with df_index and 1 other field [High correlation]
KM Travelled is highly correlated with Cost of Trip [High correlation]
Cost of Trip is highly correlated with KM Travelled [High correlation]
Population is highly correlated with Users [High correlation]
Users is highly correlated with Population [High correlation]
Year is highly correlated with df_index and 1 other field [High correlation]
Company is highly correlated with Year and 5 other fields [High correlation]
Year is highly correlated with Company [High correlation]
City is highly correlated with Company [High correlation]
Holiday is highly correlated with Company [High correlation]
Day of Week is highly correlated with Company [High correlation]
Payment_Mode is highly correlated with Company [High correlation]
Gender is highly correlated with Company [High correlation]
df_index has unique values [Unique]
Transaction ID has unique values [Unique]
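
The Constant and Unique warnings depend only on how many distinct values a column holds. The sketch below is a hand-written illustration of that idea, not pandas-profiling's own implementation; the function name `flag_columns` and the toy rows are assumptions made for the example.

```python
from typing import Dict, Hashable, List, Sequence


def flag_columns(columns: Dict[str, Sequence[Hashable]]) -> Dict[str, List[str]]:
    """Return {column name: [labels]} using 'Constant' and 'Unique' labels."""
    flags: Dict[str, List[str]] = {}
    for name, values in columns.items():
        distinct = len(set(values))
        labels = []
        if distinct == 1:
            labels.append("Constant")      # every row holds the same value
        if len(values) > 1 and distinct == len(values):
            labels.append("Unique")        # no value is repeated
        flags[name] = labels
    return flags


# Toy rows that imitate three of the profiled columns (illustrative only).
toy = {
    "Company": ["Yellow Cab", "Yellow Cab", "Yellow Cab"],
    "Transaction ID": [10000384, 10000385, 10000386],
    "Payment_Mode": ["Card", "Cash", "Cash"],
}
print(flag_columns(toy))
```

A constant column has no variation, so the correlation warnings that involve Company are probably best read with caution.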

Reproduction

Analysis started: 2021-02-27 19:44:25.389760
Analysis finished: 2021-02-27 19:45:40.595617
Duration: 1 minute and 15.21 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml
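
The reported duration can be checked against the two timestamps. A small sketch using only the Python standard library (the format string is an assumption about how the timestamps are written):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-02-27 19:44:25.389760", FMT)
finished = datetime.strptime("2021-02-27 19:45:40.595617", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```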

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 214679
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179235.1597
Minimum: 0
Maximum: 359391
Zeros: 1
Zeros (%): < 0.1%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 17282.9
Q1: 90006.5
Median: 179319
Q3: 268392
95th percentile: 341682.1
Maximum: 359391
Range: 359391
Interquartile range (IQR): 178385.5

Descriptive statistics

Standard deviation: 103834.9944
Coefficient of variation (CV): 0.5793226875
Kurtosis: -1.194441126
Mean: 179235.1597
Median Absolute Deviation (MAD): 89187
Skewness: 0.003038595775
Sum: 3.847802486 × 10^10
Variance: 1.078170607 × 10^10
Monotonicity: Strictly increasing
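
Several of the df_index figures are derived from others, which allows a quick consistency check. The sketch below is a hand-written illustration, not part of the pandas-profiling output. It recomputes the range, IQR, coefficient of variation, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation.

```python
import math

# Values copied from the df_index summary in the report.
minimum, maximum = 0.0, 359391.0
q1, q3 = 90006.5, 268392.0
mean, std = 179235.1597, 103834.9944

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(value_range, iqr, round(cv, 4), f"{variance:.4e}")
print(math.isclose(variance, 1.078170607e10, rel_tol=1e-6))
```

The same relations (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared) can be checked for the other numeric variables.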
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
61411
 
< 0.1%
3512301
 
< 0.1%
971461
 
< 0.1%
910011
 
< 0.1%
930481
 
< 0.1%
725661
 
< 0.1%
787071
 
< 0.1%
1196631
 
< 0.1%
1176121
 
< 0.1%
1298981
 
< 0.1%
Other values (214669)214669
> 99.9%
ValueCountFrequency (%)
01
< 0.1%
11
< 0.1%
41
< 0.1%
51
< 0.1%
71
< 0.1%
101
< 0.1%
111
< 0.1%
121
< 0.1%
131
< 0.1%
161
< 0.1%
ValueCountFrequency (%)
3593911
< 0.1%
3593821
< 0.1%
3593781
< 0.1%
3593771
< 0.1%
3593761
< 0.1%
3593751
< 0.1%
3593741
< 0.1%
3593731
< 0.1%
3593721
< 0.1%
3593711
< 0.1%

Transaction ID
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 214679
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10220316.1
Minimum: 10000384
Maximum: 10439521
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 10000384
5th percentile: 10021446.9
Q1: 10111083.5
Median: 10220504
Q3: 10329683.5
95th percentile: 10418340.1
Maximum: 10439521
Range: 439137
Interquartile range (IQR): 218600

Descriptive statistics

Standard deviation: 126960.4235
Coefficient of variation (CV): 0.01242235781
Kurtosis: -1.192876614
Mean: 10220316.1
Median Absolute Deviation (MAD): 109300
Skewness: 0.003114962661
Sum: 2.19408724 × 10^12
Variance: 1.611894914 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
103860661
 
< 0.1%
102282971
 
< 0.1%
102360091
 
< 0.1%
103547661
 
< 0.1%
102940581
 
< 0.1%
102469381
 
< 0.1%
102281381
 
< 0.1%
102394391
 
< 0.1%
102940451
 
< 0.1%
100776401
 
< 0.1%
Other values (214669)214669
> 99.9%
ValueCountFrequency (%)
100003841
< 0.1%
100003851
< 0.1%
100003861
< 0.1%
100003871
< 0.1%
100003881
< 0.1%
100003891
< 0.1%
100003901
< 0.1%
100003911
< 0.1%
100003921
< 0.1%
100003931
< 0.1%
ValueCountFrequency (%)
104395211
< 0.1%
104395141
< 0.1%
104395051
< 0.1%
104395041
< 0.1%
104394851
< 0.1%
104394511
< 0.1%
104394441
< 0.1%
104394371
< 0.1%
104394281
< 0.1%
104394271
< 0.1%
Distinct: 1095
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB
Minimum: 2016-01-02 00:00:00
Maximum: 2018-12-31 00:00:00
Histogram with fixed size bins (bins=50)

Company
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.7 MiB
Yellow Cab: 214679

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2146790
Distinct characters: 9
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
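
Because every row of Company holds the same string, the character and category totals can be reproduced by counting one string and scaling by the row count. A minimal sketch using the standard library's `unicodedata` and `collections.Counter`; it illustrates the idea and is not the profiler's own code.

```python
import unicodedata
from collections import Counter

N_ROWS = 214_679          # every row holds the same string
value = "Yellow Cab"

# Per-character totals across the whole column.
char_counts = {ch: n * N_ROWS for ch, n in Counter(value).items()}

# Totals per Unicode general category (Ll = lowercase, Lu = uppercase, Zs = space).
category_counts = Counter()
for ch, total in char_counts.items():
    category_counts[unicodedata.category(ch)] += total

print(char_counts["l"])
print(dict(category_counts))
print(sum(char_counts.values()))
```

The totals agree with the report's counts for this variable: 429358 occurrences of "l", 1502753 lowercase letters, 429358 uppercase letters, 214679 spaces, and 2146790 characters overall.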

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yellow Cab
2nd row: Yellow Cab
3rd row: Yellow Cab
4th row: Yellow Cab
5th row: Yellow Cab

Value        Count    Frequency (%)
Yellow Cab   214679   100.0%
Histogram of lengths of the category
ValueCountFrequency (%)
yellow214679
50.0%
cab214679
50.0%

Most occurring characters

ValueCountFrequency (%)
l429358
20.0%
Y214679
10.0%
e214679
10.0%
o214679
10.0%
w214679
10.0%
214679
10.0%
C214679
10.0%
a214679
10.0%
b214679
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1502753
70.0%
Uppercase Letter429358
 
20.0%
Space Separator214679
 
10.0%

Most frequent character per category

ValueCountFrequency (%)
l429358
28.6%
e214679
14.3%
o214679
14.3%
w214679
14.3%
a214679
14.3%
b214679
14.3%
ValueCountFrequency (%)
Y214679
50.0%
C214679
50.0%
ValueCountFrequency (%)
214679
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1932111
90.0%
Common214679
 
10.0%

Most frequent character per script

ValueCountFrequency (%)
l429358
22.2%
Y214679
11.1%
e214679
11.1%
o214679
11.1%
w214679
11.1%
C214679
11.1%
a214679
11.1%
b214679
11.1%
ValueCountFrequency (%)
214679
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2146790
100.0%

Most frequent character per block

ValueCountFrequency (%)
l429358
20.0%
Y214679
10.0%
e214679
10.0%
o214679
10.0%
w214679
10.0%
214679
10.0%
C214679
10.0%
a214679
10.0%
b214679
10.0%

City
Categorical

HIGH CORRELATION

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.9 MiB
NEW YORK NY: 85918
CHICAGO IL: 47264
LOS ANGELES CA: 28168
BOSTON MA: 24506
ATLANTA GA: 5795
Other values (10): 23028

Length

Max length: 14
Median length: 11
Mean length: 10.79549933
Min length: 8

Characters and Unicode

Total characters: 2317567
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BOSTON MA
2nd row: CHICAGO IL
3rd row: NEW YORK NY
4th row: LOS ANGELES CA
5th row: CHICAGO IL

Value              Count   Frequency (%)
NEW YORK NY        85918   40.0%
CHICAGO IL         47264   22.0%
LOS ANGELES CA     28168   13.1%
BOSTON MA          24506   11.4%
ATLANTA GA         5795    2.7%
DALLAS TX          5637    2.6%
MIAMI FL           4452    2.1%
AUSTIN TX          3028    1.4%
ORANGE COUNTY      2469    1.2%
DENVER CO          2431    1.1%
Other values (5)   5011    2.3%
Histogram of lengths of the category
ValueCountFrequency (%)
ny85918
15.8%
york85918
15.8%
new85918
15.8%
chicago47264
8.7%
il47264
8.7%
ca30179
 
5.5%
angeles28168
 
5.2%
los28168
 
5.2%
boston24506
 
4.5%
ma24506
 
4.5%
Other values (20)56613
10.4%

Most occurring characters

ValueCountFrequency (%)
329743
14.2%
N246251
10.6%
O220942
9.5%
A180564
 
7.8%
Y174305
 
7.5%
E153965
 
6.6%
C130640
 
5.6%
L127459
 
5.5%
I110438
 
4.8%
S93318
 
4.0%
Other values (15)549942
23.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1987824
85.8%
Space Separator329743
 
14.2%

Most frequent character per category

ValueCountFrequency (%)
N246251
12.4%
O220942
11.1%
A180564
 
9.1%
Y174305
 
8.8%
E153965
 
7.7%
C130640
 
6.6%
L127459
 
6.4%
I110438
 
5.6%
S93318
 
4.7%
R92482
 
4.7%
Other values (14)457460
23.0%
ValueCountFrequency (%)
329743
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1987824
85.8%
Common329743
 
14.2%

Most frequent character per script

ValueCountFrequency (%)
N246251
12.4%
O220942
11.1%
A180564
 
9.1%
Y174305
 
8.8%
E153965
 
7.7%
C130640
 
6.6%
L127459
 
6.4%
I110438
 
5.6%
S93318
 
4.7%
R92482
 
4.7%
Other values (14)457460
23.0%
ValueCountFrequency (%)
329743
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2317567
100.0%

Most frequent character per block

ValueCountFrequency (%)
329743
14.2%
N246251
10.6%
O220942
9.5%
A180564
 
7.8%
Y174305
 
7.5%
E153965
 
6.6%
C130640
 
5.6%
L127459
 
5.5%
I110438
 
4.8%
S93318
 
4.0%
Other values (15)549942
23.7%

KM Travelled
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 874
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.56776252
Minimum: 1.9
Maximum: 48
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 1.9
5th percentile: 3.57
Q1: 12
Median: 22.47
Q3: 32.98
95th percentile: 42
Maximum: 48
Range: 46.1
Interquartile range (IQR): 20.98

Descriptive statistics

Standard deviation: 12.23534095
Coefficient of variation (CV): 0.542160125
Kurtosis: -1.125975613
Mean: 22.56776252
Median Absolute Deviation (MAD): 10.49
Skewness: 0.05375046895
Sum: 4844824.69
Variance: 149.7035681
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
33.6922
 
0.4%
22.8669
 
0.3%
24654
 
0.3%
35.7644
 
0.3%
37.44639
 
0.3%
16.8628
 
0.3%
39.6609
 
0.3%
28.08568
 
0.3%
21.85479
 
0.2%
19.2469
 
0.2%
Other values (864)208398
97.1%
ValueCountFrequency (%)
1.9213
0.1%
1.92226
0.1%
1.94206
0.1%
1.96219
0.1%
1.98234
0.1%
2224
0.1%
2.02210
0.1%
2.04223
0.1%
2.06188
0.1%
2.08225
0.1%
ValueCountFrequency (%)
48221
0.1%
47.6246
0.1%
47.2233
0.1%
46.8436
0.2%
46.41240
0.1%
46.4209
0.1%
46.02244
0.1%
46196
0.1%
45.63197
0.1%
45.6416
0.2%

Price Charged
Real number (ℝ≥0)

Distinct: 92003
Distinct (%): 42.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 476.4425666
Minimum: 20.73
Maximum: 2048.03
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 20.73
5th percentile: 71.71
Q1: 234.815
Median: 439.86
Q3: 660.83
95th percentile: 1040.581
Maximum: 2048.03
Range: 2027.3
Interquartile range (IQR): 426.015

Descriptive statistics

Standard deviation: 300.6089791
Coefficient of variation (CV): 0.6309448403
Kurtosis: 0.2016614755
Mean: 476.4425666
Median Absolute Deviation (MAD): 212.1
Skewness: 0.7192370246
Sum: 102282213.8
Variance: 90365.75832
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
538.4412
 
< 0.1%
341.1612
 
< 0.1%
85.9711
 
< 0.1%
196.0311
 
< 0.1%
577.1611
 
< 0.1%
185.3811
 
< 0.1%
475.0211
 
< 0.1%
79.3811
 
< 0.1%
625.5811
 
< 0.1%
261.6610
 
< 0.1%
Other values (91993)214568
99.9%
ValueCountFrequency (%)
20.731
< 0.1%
221
< 0.1%
22.041
< 0.1%
22.111
< 0.1%
22.371
< 0.1%
22.421
< 0.1%
22.521
< 0.1%
22.591
< 0.1%
22.791
< 0.1%
22.811
< 0.1%
ValueCountFrequency (%)
2048.031
< 0.1%
2016.71
< 0.1%
2013.951
< 0.1%
1993.831
< 0.1%
1981.051
< 0.1%
1978.791
< 0.1%
1957.11
< 0.1%
1947.911
< 0.1%
1925.921
< 0.1%
1920.591
< 0.1%

Cost of Trip
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9808
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 297.9044013
Minimum: 22.8
Maximum: 691.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 22.8
5th percentile: 47.6064
Q1: 158.4
Median: 295.608
Q3: 432.5616
95th percentile: 558.144
Maximum: 691.2
Range: 668.4
Interquartile range (IQR): 274.1616

Descriptive statistics

Standard deviation: 162.5627971
Coefficient of variation (CV): 0.5456877992
Kurtosis: -1.076249653
Mean: 297.9044013
Median Absolute Deviation (MAD): 137.0256
Skewness: 0.08572898348
Sum: 63953818.96
Variance: 26426.663
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
479.808139
 
0.1%
471.744129
 
0.1%
362.88118
 
0.1%
488.376115
 
0.1%
370.656112
 
0.1%
403.2108
 
0.1%
423.36108
 
0.1%
435.456106
 
< 0.1%
287.28106
 
< 0.1%
241.92105
 
< 0.1%
Other values (9798)213533
99.5%
ValueCountFrequency (%)
22.812
< 0.1%
23.0289
< 0.1%
23.0410
< 0.1%
23.2567
< 0.1%
23.270411
< 0.1%
23.285
 
< 0.1%
23.48412
< 0.1%
23.500814
< 0.1%
23.512810
< 0.1%
23.529
< 0.1%
ValueCountFrequency (%)
691.27
 
< 0.1%
685.4424
< 0.1%
679.7289
 
< 0.1%
679.6824
< 0.1%
674.01628
< 0.1%
673.9228
< 0.1%
668.35212
 
< 0.1%
668.30442
< 0.1%
668.1618
< 0.1%
662.734815
 
< 0.1%

Customer ID
Real number (ℝ≥0)

Distinct: 28458
Distinct (%): 13.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12832.31576
Minimum: 1
Maximum: 60000
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 380
Q1: 1876
Median: 4372
Q3: 8966
95th percentile: 58682
Maximum: 60000
Range: 59999
Interquartile range (IQR): 7090

Descriptive statistics

Standard deviation: 18731.15845
Coefficient of variation (CV): 1.459686529
Kurtosis: 1.403269138
Mean: 12832.31576
Median Absolute Deviation (MAD): 3018
Skewness: 1.715258081
Sum: 2754828715
Variance: 350856297
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
180347
 
< 0.1%
136047
 
< 0.1%
49447
 
< 0.1%
63646
 
< 0.1%
90345
 
< 0.1%
276645
 
< 0.1%
12645
 
< 0.1%
257744
 
< 0.1%
99244
 
< 0.1%
107044
 
< 0.1%
Other values (28448)214225
99.8%
ValueCountFrequency (%)
125
< 0.1%
236
< 0.1%
340
< 0.1%
425
< 0.1%
523
< 0.1%
623
< 0.1%
734
< 0.1%
829
< 0.1%
935
< 0.1%
1021
< 0.1%
ValueCountFrequency (%)
6000014
< 0.1%
599996
< 0.1%
599986
< 0.1%
599978
< 0.1%
599964
 
< 0.1%
5999511
< 0.1%
5999410
< 0.1%
5999312
< 0.1%
599928
< 0.1%
599917
< 0.1%

Payment_Mode
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 MiB
Card: 128792
Cash: 85887

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 858716
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Card
2nd row: Cash
3rd row: Cash
4th row: Cash
5th row: Cash

Value   Count    Frequency (%)
Card    128792   60.0%
Cash    85887    40.0%
Histogram of lengths of the category
ValueCountFrequency (%)
card128792
60.0%
cash85887
40.0%

Most occurring characters

ValueCountFrequency (%)
C214679
25.0%
a214679
25.0%
r128792
15.0%
d128792
15.0%
s85887
10.0%
h85887
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter644037
75.0%
Uppercase Letter214679
 
25.0%

Most frequent character per category

ValueCountFrequency (%)
a214679
33.3%
r128792
20.0%
d128792
20.0%
s85887
13.3%
h85887
13.3%
ValueCountFrequency (%)
C214679
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin858716
100.0%

Most frequent character per script

ValueCountFrequency (%)
C214679
25.0%
a214679
25.0%
r128792
15.0%
d128792
15.0%
s85887
10.0%
h85887
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII858716
100.0%

Most frequent character per block

ValueCountFrequency (%)
C214679
25.0%
a214679
25.0%
r128792
15.0%
d128792
15.0%
s85887
10.0%
h85887
10.0%

Gender
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 MiB
Male: 125622
Female: 89057

Length

Max length: 6
Median length: 4
Mean length: 4.829675935
Min length: 4
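
The mean length is a count-weighted average of the label lengths, and the weighted sum is the total character count reported for this variable (1036830). A short sketch of that arithmetic, written for illustration only:

```python
# Counts copied from the Gender summary; lengths come from the labels themselves.
counts = {"Male": 125_622, "Female": 89_057}

total_rows = sum(counts.values())
total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / total_rows

print(total_rows, total_chars, round(mean_length, 6))
```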

Characters and Unicode

Total characters: 1036830
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Male
5th row: Male

Value    Count    Frequency (%)
Male     125622   58.5%
Female   89057    41.5%
Histogram of lengths of the category
ValueCountFrequency (%)
male125622
58.5%
female89057
41.5%

Most occurring characters

ValueCountFrequency (%)
e303736
29.3%
a214679
20.7%
l214679
20.7%
M125622
12.1%
F89057
 
8.6%
m89057
 
8.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter822151
79.3%
Uppercase Letter214679
 
20.7%

Most frequent character per category

ValueCountFrequency (%)
e303736
36.9%
a214679
26.1%
l214679
26.1%
m89057
 
10.8%
ValueCountFrequency (%)
M125622
58.5%
F89057
41.5%

Most occurring scripts

ValueCountFrequency (%)
Latin1036830
100.0%

Most frequent character per script

ValueCountFrequency (%)
e303736
29.3%
a214679
20.7%
l214679
20.7%
M125622
12.1%
F89057
 
8.6%
m89057
 
8.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1036830
100.0%

Most frequent character per block

ValueCountFrequency (%)
e303736
29.3%
a214679
20.7%
l214679
20.7%
M125622
12.1%
F89057
 
8.6%
m89057
 
8.6%

Age
Real number (ℝ≥0)

Distinct: 48
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.36739504
Minimum: 18
Maximum: 65
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 18
5th percentile: 19
Q1: 25
Median: 33
Q3: 42
95th percentile: 61
Maximum: 65
Range: 47
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 12.62347245
Coefficient of variation (CV): 0.3569240096
Kurtosis: -0.4762890592
Mean: 35.36739504
Median Absolute Deviation (MAD): 8
Skewness: 0.6796008014
Sum: 7592637
Variance: 159.3520567
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=48)
ValueCountFrequency (%)
237580
 
3.5%
207373
 
3.4%
327256
 
3.4%
277112
 
3.3%
217088
 
3.3%
257078
 
3.3%
227070
 
3.3%
396990
 
3.3%
336988
 
3.3%
196978
 
3.3%
Other values (38)143166
66.7%
ValueCountFrequency (%)
186258
2.9%
196978
3.3%
207373
3.4%
217088
3.3%
227070
3.3%
237580
3.5%
246644
3.1%
257078
3.3%
266844
3.2%
277112
3.3%
ValueCountFrequency (%)
651992
0.9%
642365
1.1%
632183
1.0%
622189
1.0%
612598
1.2%
602397
1.1%
592364
1.1%
582453
1.1%
572109
1.0%
562276
1.1%

Income (USD/Month)
Real number (ℝ≥0)

Distinct: 17824
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15099.1665
Minimum: 2007
Maximum: 34996
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 2007
5th percentile: 3238
Q1: 8495
Median: 14737
Q3: 21049
95th percentile: 29694
Maximum: 34996
Range: 32989
Interquartile range (IQR): 12554

Descriptive statistics

Standard deviation: 7976.343733
Coefficient of variation (CV): 0.528263844
Kurtosis: -0.6608618242
Mean: 15099.1665
Median Absolute Deviation (MAD): 6282
Skewness: 0.3022976191
Sum: 3241473964
Variance: 63622059.35
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8899110
 
0.1%
22525109
 
0.1%
9797100
 
< 0.1%
1613799
 
< 0.1%
1758098
 
< 0.1%
1341395
 
< 0.1%
1925695
 
< 0.1%
1651293
 
< 0.1%
986690
 
< 0.1%
2088488
 
< 0.1%
Other values (17814)213702
99.5%
ValueCountFrequency (%)
200716
 
< 0.1%
20091
 
< 0.1%
20101
 
< 0.1%
20111
 
< 0.1%
201267
< 0.1%
20133
 
< 0.1%
20152
 
< 0.1%
20172
 
< 0.1%
201917
 
< 0.1%
20205
 
< 0.1%
ValueCountFrequency (%)
349962
 
< 0.1%
349952
 
< 0.1%
3498923
< 0.1%
3498514
< 0.1%
349849
 
< 0.1%
349832
 
< 0.1%
349731
 
< 0.1%
349727
 
< 0.1%
3496811
< 0.1%
3496713
< 0.1%

Population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4152714.67
Minimum: 248968
Maximum: 8405837
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 248968
5th percentile: 248968
Q1: 1595037
Median: 1955130
Q3: 8405837
95th percentile: 8405837
Maximum: 8405837
Range: 8156869
Interquartile range (IQR): 6810800

Descriptive statistics

Standard deviation: 3511699.739
Coefficient of variation (CV): 0.8456395438
Kurtosis: -1.791341632
Mean: 4152714.67
Median Absolute Deviation (MAD): 1706162
Skewness: 0.3407323926
Sum: 8.915006326 × 10^11
Variance: 1.233203506 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
ValueCountFrequency (%)
840583785918
40.0%
195513047264
22.0%
159503728168
 
13.1%
24896824506
 
11.4%
8148855795
 
2.7%
9429085637
 
2.6%
13391554452
 
2.1%
6983713028
 
1.4%
10301852469
 
1.2%
7542332431
 
1.1%
Other values (5)5011
 
2.3%
ValueCountFrequency (%)
24896824506
11.4%
3272251169
 
0.5%
542085631
 
0.3%
5457761033
 
0.5%
6983713028
 
1.4%
7542332431
 
1.1%
8148855795
 
2.7%
9429085637
 
2.6%
9439991200
 
0.6%
959307978
 
0.5%
ValueCountFrequency (%)
840583785918
40.0%
195513047264
22.0%
159503728168
 
13.1%
13391554452
 
2.1%
10301852469
 
1.2%
959307978
 
0.5%
9439991200
 
0.6%
9429085637
 
2.6%
8148855795
 
2.7%
7542332431
 
1.1%

Users
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 187745.1734
Minimum: 3643
Maximum: 302149
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 3643
5th percentile: 14978
Q1: 144132
Median: 164468
Q3: 302149
95th percentile: 302149
Maximum: 302149
Range: 298506
Interquartile range (IQR): 158017

Descriptive statistics

Standard deviation: 103764.7419
Coefficient of variation (CV): 0.5526892651
Kurtosis: -1.312639547
Mean: 187745.1734
Median Absolute Deviation (MAD): 137681
Skewness: -0.187138635
Sum: 4.030494609 × 10^10
Variance: 1.076712167 × 10^10
Monotonicity: Not monotonic
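
Population and Users each take 15 distinct values whose row counts match one another (and the City counts), which suggests both are city-level attributes. The sketch below pairs the two columns by equal row count, which is an assumption rather than something the report states, and computes a count-weighted Pearson coefficient. The report may have used a different coefficient for its High correlation warning.

```python
import math

# (population, users, row count); pairing by equal row counts is an assumption.
city_rows = [
    (8405837, 302149, 85918), (1955130, 164468, 47264), (1595037, 144132, 28168),
    (248968, 80021, 24506), (814885, 24701, 5795), (942908, 22157, 5637),
    (1339155, 17675, 4452), (698371, 14978, 3028), (1030185, 12994, 2469),
    (754233, 12421, 2431), (943999, 6133, 1200), (327225, 9270, 1169),
    (545776, 7044, 1033), (959307, 69995, 978), (542085, 3643, 631),
]


def weighted_pearson(rows):
    """Pearson correlation where each (x, y) pair is repeated `w` times."""
    n = sum(w for _, _, w in rows)
    mean_x = sum(x * w for x, _, w in rows) / n
    mean_y = sum(y * w for _, y, w in rows) / n
    cov = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in rows)
    var_x = sum(w * (x - mean_x) ** 2 for x, _, w in rows)
    var_y = sum(w * (y - mean_y) ** 2 for _, y, w in rows)
    return cov / math.sqrt(var_x * var_y)


total_rows = sum(w for _, _, w in city_rows)
mean_population = sum(x * w for x, _, w in city_rows) / total_rows
r = weighted_pearson(city_rows)
print(total_rows, round(mean_population, 2), f"r = {r:.3f}")
```

The row counts sum to the 214679 observations and the weighted population mean matches the reported 4152714.67, which gives some confidence that the value tables were read correctly.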
Histogram with fixed size bins (bins=15)
ValueCountFrequency (%)
30214985918
40.0%
16446847264
22.0%
14413228168
 
13.1%
8002124506
 
11.4%
247015795
 
2.7%
221575637
 
2.6%
176754452
 
2.1%
149783028
 
1.4%
129942469
 
1.2%
124212431
 
1.1%
Other values (5)5011
 
2.3%
ValueCountFrequency (%)
3643631
 
0.3%
61331200
 
0.6%
70441033
 
0.5%
92701169
 
0.5%
124212431
1.1%
129942469
1.2%
149783028
1.4%
176754452
2.1%
221575637
2.6%
247015795
2.7%
ValueCountFrequency (%)
30214985918
40.0%
16446847264
22.0%
14413228168
 
13.1%
8002124506
 
11.4%
69995978
 
0.5%
247015795
 
2.7%
221575637
 
2.6%
176754452
 
2.1%
149783028
 
1.4%
129942469
 
1.2%

Holiday
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
-: 210758
Christmas Day: 735
Thanksgiving Day: 624
Veterans Day: 506
Labor Day: 380
Other values (6): 1676

Length

Max length: 37
Median length: 1
Mean length: 1.261278467
Min length: 1

Characters and Unicode

Total characters: 270770
Distinct characters: 40
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Value                                   Count    Frequency (%)
-                                       210758   98.2%
Christmas Day                           735      0.3%
Thanksgiving Day                        624      0.3%
Veterans Day                            506      0.2%
Labor Day                               380      0.2%
Columbus Day                            362      0.2%
Independence Day                        340      0.2%
Memorial Day                            335      0.2%
Presidents Day (Washingtons Birthday)   251      0.1%
Martin Luther King Jr. Day              231      0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
210758
95.8%
day3921
 
1.8%
christmas735
 
0.3%
thanksgiving624
 
0.3%
veterans506
 
0.2%
labor380
 
0.2%
columbus362
 
0.2%
independence340
 
0.2%
memorial335
 
0.2%
washingtons251
 
0.1%
Other values (8)1740
 
0.8%

Most occurring characters

ValueCountFrequency (%)
-210758
77.8%
a7391
 
2.7%
5273
 
1.9%
y4172
 
1.5%
n3989
 
1.5%
s3966
 
1.5%
D3921
 
1.4%
e3754
 
1.4%
i3533
 
1.3%
r3308
 
1.2%
Other values (30)20705
 
7.6%

Most occurring categories

ValueCountFrequency (%)
Dash Punctuation210758
77.8%
Lowercase Letter44812
 
16.5%
Uppercase Letter9194
 
3.4%
Space Separator5273
 
1.9%
Open Punctuation251
 
0.1%
Close Punctuation251
 
0.1%
Other Punctuation231
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a7391
16.5%
y4172
9.3%
n3989
8.9%
s3966
8.9%
e3754
8.4%
i3533
7.9%
r3308
 
7.4%
t2456
 
5.5%
h2092
 
4.7%
g1730
 
3.9%
Other values (11)8421
18.8%
ValueCountFrequency (%)
D3921
42.6%
C1097
 
11.9%
T624
 
6.8%
L611
 
6.6%
M566
 
6.2%
V506
 
5.5%
I340
 
3.7%
P251
 
2.7%
W251
 
2.7%
B251
 
2.7%
Other values (4)776
 
8.4%
ValueCountFrequency (%)
-210758
100.0%
ValueCountFrequency (%)
5273
100.0%
ValueCountFrequency (%)
.231
100.0%
ValueCountFrequency (%)
(251
100.0%
ValueCountFrequency (%)
)251
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common216764
80.1%
Latin54006
 
19.9%

Most frequent character per script

ValueCountFrequency (%)
a7391
13.7%
y4172
 
7.7%
n3989
 
7.4%
s3966
 
7.3%
D3921
 
7.3%
e3754
 
7.0%
i3533
 
6.5%
r3308
 
6.1%
t2456
 
4.5%
h2092
 
3.9%
Other values (25)15424
28.6%
ValueCountFrequency (%)
-210758
97.2%
5273
 
2.4%
(251
 
0.1%
)251
 
0.1%
.231
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII270770
100.0%

Most frequent character per block

ValueCountFrequency (%)
-210758
77.8%
a7391
 
2.7%
5273
 
1.9%
y4172
 
1.5%
n3989
 
1.5%
s3966
 
1.5%
D3921
 
1.4%
e3754
 
1.4%
i3533
 
1.3%
r3308
 
1.2%
Other values (30)20705
 
7.6%

Profit
Real number (ℝ)

Distinct: 201424
Distinct (%): 93.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 178.5381653
Minimum: -160.714
Maximum: 1463.966
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: -160.714
5th percentile: 0.72796
Q1: 42.159
Median: 117.5772
Q3: 263.699
95th percentile: 556.4564
Maximum: 1463.966
Range: 1624.68
Interquartile range (IQR): 221.54

Descriptive statistics

Standard deviation: 183.2411271
Coefficient of variation (CV): 1.026341492
Kurtosis: 2.317422415
Mean: 178.5381653
Median Absolute Deviation (MAD): 90.9692
Skewness: 1.479546876
Sum: 38328394.8
Variance: 33577.31065
Monotonicity: Not monotonic
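
The Profit mean and sum equal the differences between the corresponding Price Charged and Cost of Trip figures. That is consistent with Profit having been derived as Price Charged minus Cost of Trip, although the report itself does not say how the column was built. A small check, written as an illustration:

```python
import math

# Means and sums copied from the three variable summaries.
price_mean, cost_mean, profit_mean = 476.4425666, 297.9044013, 178.5381653
price_sum, cost_sum, profit_sum = 102282213.8, 63953818.96, 38328394.8

mean_gap = price_mean - cost_mean
sum_gap = price_sum - cost_sum

print(round(mean_gap, 7), round(sum_gap, 2))
print(math.isclose(mean_gap, profit_mean, abs_tol=1e-6))
print(math.isclose(sum_gap, profit_sum, rel_tol=1e-8))
```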
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
83.198
 
< 0.1%
4.757
 
< 0.1%
67.886
 
< 0.1%
15.186
 
< 0.1%
20.256
 
< 0.1%
31.756
 
< 0.1%
50.146
 
< 0.1%
14.316
 
< 0.1%
12.256
 
< 0.1%
113.636
 
< 0.1%
Other values (201414)214616
> 99.9%
ValueCountFrequency (%)
-160.7141
< 0.1%
-145.94681
< 0.1%
-144.76641
< 0.1%
-144.44641
< 0.1%
-135.87521
< 0.1%
-134.741
< 0.1%
-134.2041
< 0.1%
-133.6821
< 0.1%
-133.6721
< 0.1%
-133.2081
< 0.1%
ValueCountFrequency (%)
1463.9661
< 0.1%
1445.2721
< 0.1%
1433.3421
< 0.1%
1424.14081
< 0.1%
1408.3441
< 0.1%
1408.02521
< 0.1%
1399.111
< 0.1%
1390.44641
< 0.1%
1371.6261
< 0.1%
1338.921
< 0.1%

Year
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.9 MiB
2017.0: 77086
2018.0: 73449
2016.0: 64144

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 1288074
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016.0
2nd row: 2016.0
3rd row: 2016.0
4th row: 2016.0
5th row: 2016.0

Value    Count   Frequency (%)
2017.0   77086   35.9%
2018.0   73449   34.2%
2016.0   64144   29.9%
Histogram of lengths of the category
ValueCountFrequency (%)
2017.077086
35.9%
2018.073449
34.2%
2016.064144
29.9%

Most occurring characters

ValueCountFrequency (%)
0429358
33.3%
2214679
16.7%
1214679
16.7%
.214679
16.7%
777086
 
6.0%
873449
 
5.7%
664144
 
5.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1073395
83.3%
Other Punctuation214679
 
16.7%

Most frequent character per category

ValueCountFrequency (%)
0429358
40.0%
2214679
20.0%
1214679
20.0%
777086
 
7.2%
873449
 
6.8%
664144
 
6.0%
ValueCountFrequency (%)
.214679
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1288074
100.0%

Most frequent character per script

ValueCountFrequency (%)
0429358
33.3%
2214679
16.7%
1214679
16.7%
.214679
16.7%
777086
 
6.0%
873449
 
5.7%
664144
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1288074
100.0%

Most frequent character per block

ValueCountFrequency (%)
0429358
33.3%
2214679
16.7%
1214679
16.7%
.214679
16.7%
777086
 
6.0%
873449
 
5.7%
664144
 
5.0%

Month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.469282044
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 5
Median: 8
Q3: 11
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.470552254
Coefficient of variation (CV): 0.4646433531
Kurtosis: -1.085846652
Mean: 7.469282044
Median Absolute Deviation (MAD): 3
Skewness: -0.3797205204
Sum: 1603498
Variance: 12.04473295
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
1228617
13.3%
1125343
11.8%
1024013
11.2%
921442
10.0%
818303
8.5%
716188
7.5%
614516
6.8%
514406
6.7%
113975
6.5%
313264
6.2%
Other values (2)24612
11.5%
ValueCountFrequency (%)
113975
6.5%
211434
5.3%
313264
6.2%
413178
6.1%
514406
6.7%
614516
6.8%
716188
7.5%
818303
8.5%
921442
10.0%
1024013
11.2%
ValueCountFrequency (%)
1228617
13.3%
1125343
11.8%
1024013
11.2%
921442
10.0%
818303
8.5%
716188
7.5%
614516
6.8%
514406
6.7%
413178
6.1%
313264
6.2%

Day of Week
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size13.1 MiB
Friday
48528 
Saturday
46892 
Sunday
42156 
Thursday
23503 
Wednesday
18031 
Other values (2)
35569 

Length

Max length9
Median length6
Mean length6.990525389
Min length6

Characters and Unicode

Total characters1500719
Distinct characters17
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Saturday
2nd row: Saturday
3rd row: Saturday
4th row: Saturday
5th row: Saturday

Value | Count | Frequency (%)
Friday | 48528 | 22.6%
Saturday | 46892 | 21.8%
Sunday | 42156 | 19.6%
Thursday | 23503 | 10.9%
Wednesday | 18031 | 8.4%
Monday | 17807 | 8.3%
Tuesday | 17762 | 8.3%
Histogram of lengths of the category

Value | Count | Frequency (%)
friday | 48528 | 22.6%
saturday | 46892 | 21.8%
sunday | 42156 | 19.6%
thursday | 23503 | 10.9%
wednesday | 18031 | 8.4%
monday | 17807 | 8.3%
tuesday | 17762 | 8.3%

Most occurring characters

Value | Count | Frequency (%)
a | 261571 | 17.4%
d | 232710 | 15.5%
y | 214679 | 14.3%
u | 130313 | 8.7%
r | 118923 | 7.9%
S | 89048 | 5.9%
n | 77994 | 5.2%
s | 59296 | 4.0%
e | 53824 | 3.6%
F | 48528 | 3.2%
Other values (7) | 213833 | 14.2%
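
The character statistics above follow directly from the category frequencies of Day of Week. A minimal sketch, using only the standard library, that rebuilds the per-character counts, the total character count and the mean length from the reported category counts:

```python
from collections import Counter

# Category counts as reported for Day of Week.
day_counts = {
    "Friday": 48528, "Saturday": 46892, "Sunday": 42156, "Thursday": 23503,
    "Wednesday": 18031, "Monday": 17807, "Tuesday": 17762,
}

# Each occurrence of a day name contributes one count per character in it.
char_counts = Counter()
for day, count in day_counts.items():
    for ch in day:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(day_counts.values())

print(total_chars, len(char_counts), mean_length)
print(char_counts.most_common(5))
```

Counting is case-sensitive here, which is why "S" and "s" appear as separate entries in the table.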

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1286040 | 85.7%
Uppercase Letter | 214679 | 14.3%

Most frequent character per category

Lowercase Letter

Value | Count | Frequency (%)
a | 261571 | 20.3%
d | 232710 | 18.1%
y | 214679 | 16.7%
u | 130313 | 10.1%
r | 118923 | 9.2%
n | 77994 | 6.1%
s | 59296 | 4.6%
e | 53824 | 4.2%
i | 48528 | 3.8%
t | 46892 | 3.6%
Other values (2) | 41310 | 3.2%

Uppercase Letter

Value | Count | Frequency (%)
S | 89048 | 41.5%
F | 48528 | 22.6%
T | 41265 | 19.2%
W | 18031 | 8.4%
M | 17807 | 8.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1500719 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
a | 261571 | 17.4%
d | 232710 | 15.5%
y | 214679 | 14.3%
u | 130313 | 8.7%
r | 118923 | 7.9%
S | 89048 | 5.9%
n | 77994 | 5.2%
s | 59296 | 4.0%
e | 53824 | 3.6%
F | 48528 | 3.2%
Other values (7) | 213833 | 14.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1500719 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
a | 261571 | 17.4%
d | 232710 | 15.5%
y | 214679 | 14.3%
u | 130313 | 8.7%
r | 118923 | 7.9%
S | 89048 | 5.9%
n | 77994 | 5.2%
s | 59296 | 4.0%
e | 53824 | 3.6%
F | 48528 | 3.2%
Other values (7) | 213833 | 14.2%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
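
The calculation can be sketched in plain Python. The example below is illustrative only: it uses KM Travelled and Cost of Trip from the ten "First rows" shown in the Sample section rather than the full dataset, and the helper names are hypothetical.

```python
from math import sqrt

# KM Travelled and Cost of Trip from the first ten sample rows of the report.
km_travelled = [15.15, 2.18, 34.56, 19.20, 13.92, 10.62, 30.00, 4.72, 12.98, 19.57]
cost_of_trip = [205.4340, 26.4216, 485.2224, 246.5280, 185.4144,
                138.9096, 403.2000, 62.8704, 165.1056, 248.9304]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Sample covariance; covariance(xs, xs) is the sample variance of xs.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    # Covariance divided by the product of the standard deviations.
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

r = pearson_r(km_travelled, cost_of_trip)

# Shifting and rescaling one variable (by a positive factor) leaves r unchanged.
rescaled = pearson_r([2 * x + 3 for x in km_travelled], cost_of_trip)
print(r, rescaled)
```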

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
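
A minimal sketch of this rank-then-correlate procedure in plain Python follows. It uses KM Travelled and Price Charged from the ten "First rows" in the Sample section for illustration only; ties receive the average of their positions, and the helper names are hypothetical.

```python
from math import sqrt

# KM Travelled and Price Charged from the first ten sample rows of the report.
km_travelled = [15.15, 2.18, 34.56, 19.20, 13.92, 10.62, 30.00, 4.72, 12.98, 19.57]
price_charged = [342.62, 51.47, 1121.11, 529.23, 327.23,
                 347.48, 1000.52, 105.79, 382.31, 552.86]

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    # Pearson's r applied to the rank variables.
    return pearson_r(average_ranks(xs), average_ranks(ys))

rho = spearman_rho(km_travelled, price_charged)
print(rho)
```

Because only ranks matter, any strictly increasing transformation of either variable leaves ρ unchanged.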

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
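
The pair-counting definition can be sketched directly in Python. This computes the simple form described above (often called tau-a); libraries commonly use a tie-adjusted variant, so results may differ when the data contain ties. The data are the ten "First rows" from the Sample section, used for illustration only, and the function name is hypothetical.

```python
from itertools import combinations

# KM Travelled and Price Charged from the first ten sample rows of the report.
km_travelled = [15.15, 2.18, 34.56, 19.20, 13.92, 10.62, 30.00, 4.72, 12.98, 19.57]
price_charged = [342.62, 51.47, 1121.11, 529.23, 327.23,
                 347.48, 1000.52, 105.79, 382.31, 552.86]

def kendall_tau_a(xs, ys):
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables order the pair the same way
            concordant += 1
        elif direction < 0:    # the variables order the pair oppositely
            discordant += 1
        # direction == 0 is a tie and counts towards neither group
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

tau = kendall_tau_a(km_travelled, price_charged)
print(tau)
```

This brute-force version examines every pair, so it is quadratic in the number of observations and only suitable for small samples.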

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
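
A minimal sketch of a bias-corrected Cramér's V, following the correction usually attributed to Bergsma (2013), is shown below. It is not taken from the profiler's source and may differ from it in detail. The inputs are Payment_Mode and Gender from the twenty rows shown in the Sample section, which is far too small a sample to be meaningful and serves only to make the example runnable; the function name is hypothetical.

```python
from collections import Counter
from math import sqrt

# Payment_Mode and Gender from the ten first and ten last sample rows.
payment = ["Card", "Cash", "Cash", "Cash", "Cash", "Card", "Card", "Card", "Card", "Card",
           "Card", "Cash", "Card", "Card", "Cash", "Card", "Card", "Card", "Card", "Card"]
gender = ["Male"] * 10 + ["Female", "Male", "Female", "Male", "Female",
                          "Female", "Female", "Female", "Male", "Male"]

def cramers_v_corrected(xs, ys):
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # association is undefined for a constant variable

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    # Bias correction of phi^2 and of the table dimensions.
    phi2_corr = max(0.0, chi2 / n - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

v = cramers_v_corrected(payment, gender)
print(v)
```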

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
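
The per-column nullity count behind such a chart can be sketched as follows. The rows here are hypothetical and deliberately contain gaps (the profiled dataset itself reports zero missing cells); only the column names are borrowed from the report, and the function name is an assumption.

```python
def nullity_by_column(rows, columns):
    """Count missing values (None or NaN) per column."""
    counts = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            # NaN is the only float that is not equal to itself.
            if value is None or (isinstance(value, float) and value != value):
                counts[column] += 1
    return counts

# Hypothetical rows with gaps, for illustration only.
rows = [
    {"City": "BOSTON MA", "Age": 34.0, "Gender": "Male"},
    {"City": None, "Age": 19.0, "Gender": "Male"},
    {"City": "CHICAGO IL", "Age": float("nan"), "Gender": None},
    {"City": "DALLAS TX", "Age": 37.0},  # Gender key absent entirely
]

missing = nullity_by_column(rows, ["City", "Age", "Gender"])
print(missing)
```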

Sample

First rows

(index) | df_index | Transaction ID | Date of Travel | Company | City | KM Travelled | Price Charged | Cost of Trip | Customer ID | Payment_Mode | Gender | Age | Income (USD/Month) | Population | Users | Holiday | Profit | Year | Month | Day of Week
0 | 0 | 10000429.0 | 2016-01-02 | Yellow Cab | BOSTON MA | 15.15 | 342.62 | 205.4340 | 57474.0 | Card | Male | 34.0 | 16558.0 | 248968.0 | 80021.0 | - | 137.1860 | 2016.0 | 1.0 | Saturday
1 | 1 | 10000525.0 | 2016-01-02 | Yellow Cab | CHICAGO IL | 2.18 | 51.47 | 26.4216 | 4551.0 | Cash | Male | 19.0 | 6316.0 | 1955130.0 | 164468.0 | - | 25.0484 | 2016.0 | 1.0 | Saturday
2 | 4 | 10000927.0 | 2016-01-02 | Yellow Cab | NEW YORK NY | 34.56 | 1121.11 | 485.2224 | 1808.0 | Cash | Male | 59.0 | 18999.0 | 8405837.0 | 302149.0 | - | 635.8876 | 2016.0 | 1.0 | Saturday
3 | 5 | 10000721.0 | 2016-01-02 | Yellow Cab | LOS ANGELES CA | 19.20 | 529.23 | 246.5280 | 8117.0 | Cash | Male | 21.0 | 5946.0 | 1595037.0 | 144132.0 | - | 282.7020 | 2016.0 | 1.0 | Saturday
4 | 7 | 10000519.0 | 2016-01-02 | Yellow Cab | CHICAGO IL | 13.92 | 327.23 | 185.4144 | 4429.0 | Cash | Male | 20.0 | 23387.0 | 1955130.0 | 164468.0 | - | 141.8156 | 2016.0 | 1.0 | Saturday
5 | 10 | 10000806.0 | 2016-01-02 | Yellow Cab | NEW YORK NY | 10.62 | 347.48 | 138.9096 | 2153.0 | Card | Male | 18.0 | 8193.0 | 8405837.0 | 302149.0 | - | 208.5704 | 2016.0 | 1.0 | Saturday
6 | 11 | 10001009.0 | 2016-01-02 | Yellow Cab | PHOENIX AZ | 30.00 | 1000.52 | 403.2000 | 21481.0 | Card | Male | 28.0 | 18030.0 | 943999.0 | 6133.0 | - | 597.3200 | 2016.0 | 1.0 | Saturday
7 | 12 | 10000516.0 | 2016-01-02 | Yellow Cab | CHICAGO IL | 4.72 | 105.79 | 62.8704 | 5803.0 | Card | Male | 54.0 | 4964.0 | 1955130.0 | 164468.0 | - | 42.9196 | 2016.0 | 1.0 | Saturday
8 | 13 | 10000663.0 | 2016-01-02 | Yellow Cab | DALLAS TX | 12.98 | 382.31 | 165.1056 | 26299.0 | Card | Male | 37.0 | 2729.0 | 942908.0 | 22157.0 | - | 217.2044 | 2016.0 | 1.0 | Saturday
9 | 16 | 10000906.0 | 2016-01-02 | Yellow Cab | NEW YORK NY | 19.57 | 552.86 | 248.9304 | 2682.0 | Card | Male | 36.0 | 15280.0 | 8405837.0 | 302149.0 | - | 303.9296 | 2016.0 | 1.0 | Saturday

Last rows

(index) | df_index | Transaction ID | Date of Travel | Company | City | KM Travelled | Price Charged | Cost of Trip | Customer ID | Payment_Mode | Gender | Age | Income (USD/Month) | Population | Users | Holiday | Profit | Year | Month | Day of Week
214669 | 359371 | 10435429.0 | 2018-12-31 | Yellow Cab | NEW YORK NY | 22.44 | 572.95 | 301.5936 | 1402.0 | Card | Female | 57.0 | 8870.0 | 8405837.0 | 302149.0 | - | 271.3564 | 2018.0 | 12.0 | Monday
214670 | 359372 | 10437882.0 | 2018-12-31 | Yellow Cab | CHICAGO IL | 18.72 | 247.24 | 226.8864 | 5633.0 | Cash | Male | 49.0 | 23206.0 | 1955130.0 | 164468.0 | - | 20.3536 | 2018.0 | 12.0 | Monday
214671 | 359373 | 10434558.0 | 2018-12-31 | Yellow Cab | CHICAGO IL | 35.34 | 471.81 | 496.1736 | 5085.0 | Card | Female | 33.0 | 10274.0 | 1955130.0 | 164468.0 | - | -24.3636 | 2018.0 | 12.0 | Monday
214672 | 359374 | 10435235.0 | 2018-12-31 | Yellow Cab | NEW YORK NY | 3.80 | 82.52 | 53.3520 | 1762.0 | Card | Male | 62.0 | 4016.0 | 8405837.0 | 302149.0 | - | 29.1680 | 2018.0 | 12.0 | Monday
214673 | 359375 | 10434288.0 | 2018-12-31 | Yellow Cab | BOSTON MA | 2.26 | 31.37 | 28.2048 | 59187.0 | Cash | Female | 52.0 | 13751.0 | 248968.0 | 80021.0 | - | 3.1652 | 2018.0 | 12.0 | Monday
214674 | 359376 | 10434486.0 | 2018-12-31 | Yellow Cab | CHICAGO IL | 38.11 | 558.03 | 484.7592 | 5727.0 | Card | Female | 22.0 | 2106.0 | 1955130.0 | 164468.0 | - | 73.2708 | 2018.0 | 12.0 | Monday
214675 | 359377 | 10435871.0 | 2018-12-31 | Yellow Cab | ORANGE COUNTY | 15.34 | 247.86 | 185.9208 | 16342.0 | Card | Female | 23.0 | 2677.0 | 1030185.0 | 12994.0 | - | 61.9392 | 2018.0 | 12.0 | Monday
214676 | 359378 | 10438475.0 | 2018-12-31 | Yellow Cab | LOS ANGELES CA | 14.40 | 219.43 | 179.7120 | 7237.0 | Card | Female | 18.0 | 20571.0 | 1595037.0 | 144132.0 | - | 39.7180 | 2018.0 | 12.0 | Monday
214677 | 359382 | 10438162.0 | 2018-12-31 | Yellow Cab | CHICAGO IL | 34.72 | 472.05 | 433.3056 | 4263.0 | Card | Male | 36.0 | 19488.0 | 1955130.0 | 164468.0 | - | 38.7444 | 2018.0 | 12.0 | Monday
214678 | 359391 | 10434637.0 | 2018-12-31 | Yellow Cab | CHICAGO IL | 3.36 | 44.86 | 43.9488 | 5926.0 | Card | Male | 58.0 | 31841.0 | 1955130.0 | 164468.0 | - | 0.9112 | 2018.0 | 12.0 | Monday